Classifying noisy protein sequence data: a case study of immunoglobulin light chains

Authors

  • Chenggang Yu
  • Nela Zavaljevski
  • Fred J. Stevens
  • Kelly Yackovich
  • Jaques Reifman
Abstract

The classification of protein sequences obtained from patients with various immunoglobulin-related conformational diseases may provide insight into structural correlates of pathogenicity. However, clinical data are very sparse and, in the case of antibody-related proteins, the collected sequences have large variability with only a small subset of variations relevant to the protein pathogenicity (function). On this basis, these sequences represent a model system for development of strategies to recognize the small subset of function-determining variations among the much larger number of primary structure diversifications introduced during evolution. Under such conditions, most protein classification algorithms have limited accuracy. To address this problem, we propose a support vector machine (SVM)-based classifier that combines sequence and 3D structural averaging information. Each amino acid in the sequence is represented by a set of six physicochemical properties: hydrophobicity, hydrophilicity, volume, surface area, bulkiness and refractivity. Each position in the sequence is described by the properties of the amino acid at that position and the properties of its neighbors in 3D space or in the sequence. A structure template is selected to determine neighbors in 3D space and a window size is used to determine the neighbors in the sequence. The test data consist of 209 proteins of human antibody immunoglobulin light chains, each represented by aligned sequences of 120 amino acids. The methodology is applied to the classification of protein sequences collected from patients with and without amyloidosis; the results indicate that the proposed modified classifiers are more robust to sequence variability than standard SVM classifiers, improving classification error by 5 to 25% and sensitivity by 9 to 17%. The classification results might also suggest possible mechanisms for the propensity of immunoglobulin light chains to form amyloid.
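The position-wise averaging described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `window_average` and `neighbor_average`, the choice of the Kyte-Doolittle hydropathy index as the hydrophobicity scale, the handling of alignment gaps, and the format of the neighbor map are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract names
# "hydrophobicity" but not the scale; this one is used only as an example.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _mean(values: List[float]) -> float:
    # ASSUMPTION: a position with no valid residues nearby gets 0.0.
    return sum(values) / len(values) if values else 0.0


def window_average(seq: str, scale: Dict[str, float],
                   half_window: int) -> List[float]:
    """Average one property over each position and its sequence neighbors.

    Symbols missing from `scale` (for example the gap character '-')
    are skipped, which is an assumption about how gaps are treated.
    """
    features = []
    for i in range(len(seq)):
        lo = max(0, i - half_window)
        hi = min(len(seq), i + half_window + 1)
        features.append(_mean([scale[aa] for aa in seq[lo:hi] if aa in scale]))
    return features


def neighbor_average(seq: str, scale: Dict[str, float],
                     neighbors: Dict[int, Sequence[int]]) -> List[float]:
    """Average one property over each position and its 3D neighbors.

    `neighbors` maps an aligned position to the positions that are close
    to it in a structure template (hypothetical input format).
    """
    features = []
    for i in range(len(seq)):
        idx = [i, *neighbors.get(i, ())]
        features.append(_mean([scale[seq[j]] for j in idx
                               if 0 <= j < len(seq) and seq[j] in scale]))
    return features
```

Under these assumptions, repeating the averaging for each of the six properties and concatenating the results would give one fixed-length feature vector per aligned sequence, which could then be passed to any SVM implementation.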


Related articles

Determination of discontinuous epitopes of the human immunoglobulin light chain by computational immunology

Background: Immunoglobulins are a group of proteins that have an important role in defense against microorganisms. Immunoglobulins consist of heavy and light chains. In humans, the immunoglobulin light chain comprises two isotypes, kappa (K) and lambda (λ), distinguished by amino acid differences in the carboxylic end of their constant region. Marked changes in the K to λ ratio can happen in monocl...


Development and characterization of polyclonal antibody against human kappa light chain in rabbit

Polyclonal antibodies against kappa light chain are used to diagnose diseases that produce free light chains. The kappa and lambda light chains are products of immunoglobulin synthesis and, under normal conditions, are released in minor amounts into body fluids such as serum, cerebrospinal fluid, urine and synovial fluid. The purpose of this study was the production and purification of polyclonal immunog...


Cutaneous amyloidosis as the first presentation of Waldenstrom macroglobulinemia

Background: Waldenstrom macroglobulinemia is a lymphoplasmacytic lymphoma with elevated serum immunoglobulin M and multi-organ involvement. Primary systemic amyloidosis usually develops when immunoglobulin light chains are deposited in different organs as a result of an underlying gammopathy. Case presentation: Our patient was an 86-year-old man with macroglossia, ecchymotic patches and bullous lesion...


Nucleic acid and protein sequences of phosphocholine-binding light chains

An 18-kilobase DNA fragment containing the sequence coding for both the variable and constant regions of the S107 mouse immunoglobulin light chain was cloned from total cellular DNA. The complete nucleotide sequence of the kappa-chain variable-region gene is reported. The amino acid sequence encoded by the DNA is found to be identical to the protein sequence of the T15 light ch...


Compatibility of B-Sheets with Epitopes Predicted by Immunoinformatic in Human IgG

Background & Aims: Antibodies, well-known as immunoglobulins (Igs), are produced by B lymphocytes and specifically defend against pathogens. Igs are glycoproteins and have high diagnostic value in several diseases including infections (1). Igs are composed of light and heavy chains (2, 3). Each chain is comprised of about 110-120 amino acid residues which create immunoglobulin folds named domai...



Journal:
  • Bioinformatics

Volume 21 Suppl 1  Issue 

Pages  -

Publication date 2005